docs: add tutorials of fine-tune on a custom dataset #711
Very comprehensive:)
### Read Custom Dataset
For a custom dataset, you can either organize the dataset files locally into a directory tree similar to ImageNet's and then read it with the `create_dataset` function (offline way), or read all the images directly into an iterable object, which replaces both the file-splitting step and the `create_dataset` step (online way).
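A minimal sketch of the offline way, assuming the custom data starts as a flat folder of images plus a list of `(filename, label)` pairs. `organize_as_imagefolder` is a hypothetical helper, not part of MindCV: it only copies each image into an ImageNet-style `root/split/class_name/` tree, which is the kind of layout the offline way relies on.

```python
import os
import shutil


def organize_as_imagefolder(samples, src_dir, dst_root, split="train"):
    """Copy (filename, label) pairs into dst_root/split/label/filename.

    Hypothetical helper: builds an ImageNet-style tree with one
    sub-directory per class, and returns the path of the split directory.
    """
    split_dir = os.path.join(dst_root, split)
    for filename, label in samples:
        class_dir = os.path.join(split_dir, str(label))
        os.makedirs(class_dir, exist_ok=True)
        # copy2 keeps file metadata; the source folder is left untouched
        shutil.copy2(os.path.join(src_dir, filename), os.path.join(class_dir, filename))
    return split_dir
```

The resulting root directory could then be handed to `create_dataset`; check the MindCV documentation for the exact arguments it expects.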
It is a bit confusing why they are named offline/online. It would be clearer to use the names directly: `ImageFolderDataset` and `GeneratorDataset`.
Since downstream datasets can be structured in many different ways, two approaches are common practice for handling this diversity:
Offline processing: manually reorganize the data into a standard format so that existing interfaces can load it. "Offline" refers to the manual step of reformatting the dataset.
Online processing: build new interfaces that can load datasets in specific formats. "Online" refers to the fact that the newly built interface parses the data structure at runtime.
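A minimal sketch of the online way, assuming the samples have already been parsed at runtime into `(image, label_name)` pairs. `CustomImageSource` is a hypothetical class; it implements `__getitem__` and `__len__`, the random-accessible interface that MindSpore's `GeneratorDataset` accepts as a source. The byte strings stand in for real images.

```python
class CustomImageSource:
    """Hypothetical random-accessible source (indexable and sized)."""

    def __init__(self, samples):
        # samples: list of (image, label_name) pairs parsed at runtime
        self.samples = list(samples)
        # map class names to contiguous integer ids in sorted order
        names = sorted({label for _, label in self.samples})
        self.class_to_idx = {name: i for i, name in enumerate(names)}

    def __getitem__(self, index):
        image, label_name = self.samples[index]
        return image, self.class_to_idx[label_name]

    def __len__(self):
        return len(self.samples)
```

With MindSpore installed, such an object would typically be wrapped as `mindspore.dataset.GeneratorDataset(source, column_names=["image", "label"])`; in practice `__getitem__` would usually return NumPy arrays (for example, decoded images) rather than the placeholder bytes used here.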
```python
# Freeze every trainable parameter except those of the classifier head
for param in network.trainable_params():
    if param.name not in classifier_names:
        param.requires_grad = False
```
It would be better to wrap this backbone-freezing code in a function so that it can easily be reused for other datasets/tasks, e.g.:
```python
def freeze_backbone(net, cfg):
    ...
    return net
```
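One way the suggested function might be filled in, reusing the freezing loop from the tutorial. This is a sketch under assumptions: `net` is expected to behave like a MindSpore `Cell` (its `trainable_params()` returns parameters exposing `.name` and a settable `.requires_grad`), and `cfg.classifier_names` is a hypothetical config field listing the classifier head's parameter names.

```python
def freeze_backbone(net, cfg):
    """Freeze every trainable parameter that is not part of the classifier head.

    Assumes a MindSpore-style network; cfg.classifier_names is a
    hypothetical config field holding the head's parameter names.
    """
    head = set(cfg.classifier_names)
    # trainable_params() returns a list, so changing requires_grad while
    # iterating over it is safe
    for param in net.trainable_params():
        if param.name not in head:
            param.requires_grad = False
    return net
```

Because the function relies only on `trainable_params()` and parameter names, the same helper could serve other datasets or tasks as long as the config supplies the right head names.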
Thank you for your contribution to the MindCV repo.
Motivation
Add a tutorial on fine-tuning on a custom dataset, along with the relevant code.
Test Plan
(How should this PR be tested? Do you require special setup to run the test or repro the fixed bug?)
Related Issues and PRs
(Is this PR part of a group of changes? Link the other relevant PRs and Issues here. Use https://help.github.com/en/articles/closing-issues-using-keywords for help on GitHub syntax)